AITopics | observable markov game

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games

Neural Information Processing Systems | Dec-24-2025, 11:25:58 GMT

This paper considers the challenging tasks of Multi-Agent Reinforcement Learning (MARL) under partial observability, where each agent sees only her own observations and actions, which reveal incomplete information about the underlying state of the system. It studies these tasks under the general model of multiplayer general-sum Partially Observable Markov Games (POMGs), a model class significantly larger than the standard Imperfect Information Extensive-Form Games (IIEFGs). We identify a rich subclass of POMGs, the weakly revealing POMGs, in which sample-efficient learning is tractable. In the self-play setting, we prove that a simple algorithm combining optimism and Maximum Likelihood Estimation (MLE) is sufficient to find approximate Nash equilibria, correlated equilibria, and coarse correlated equilibria of weakly revealing POMGs in a polynomial number of samples when the number of agents is small. In the setting of playing against adversarial opponents, we show that a variant of our optimistic MLE algorithm achieves sublinear regret when compared against the optimal maximin policies. To the best of our knowledge, this work provides the first line of sample-efficient results for learning POMGs.
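The idea of combining optimism with MLE can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a finite set of candidate Bernoulli models standing in for POMG models, a hypothetical slack parameter `beta` defining the confidence set, and a made-up "value" equal to the model parameter itself. It keeps every candidate whose log-likelihood is within `beta` of the maximum, then picks the one with the highest value.

```python
import math

def log_likelihood(p, observations):
    """Log-likelihood of i.i.d. Bernoulli(p) observations (toy stand-in for a POMG model)."""
    return sum(math.log(p if o == 1 else 1.0 - p) for o in observations)

def optimistic_mle(candidates, observations, beta):
    """Return the highest-value candidate within beta of the maximum log-likelihood."""
    scores = {p: log_likelihood(p, observations) for p in candidates}
    best = max(scores.values())
    # Confidence set: all models that explain the data almost as well as the MLE.
    confidence_set = [p for p in candidates if scores[p] >= best - beta]
    # Toy assumption: the value of a model is simply its parameter p.
    return max(confidence_set), confidence_set

candidates = [0.2, 0.4, 0.6, 0.8]        # hypothetical finite model class
observations = [1, 0, 1, 1, 0, 1, 0, 1]  # 5 ones, 3 zeros
chosen, conf_set = optimistic_mle(candidates, observations, beta=1.0)
greedy, _ = optimistic_mle(candidates, observations, beta=0.0)
```

With `beta=0.0` the rule reduces to plain MLE, while a larger `beta` lets the learner act on a more favorable model that the data has not yet ruled out, which is what drives exploration in this toy setting.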

Calibration of Shared Equilibria in General Sum Partially Observable Markov Games

Neural Information Processing Systems | Dec-24-2025, 09:37:15 GMT

Training multi-agent systems (MAS) to achieve realistic equilibria gives us a useful tool to understand and model real-world systems. We consider a general sum partially observable Markov game where agents of different types share a single policy network, conditioned on agent-specific information. This paper aims at i) formally understanding equilibria reached by such agents, and ii) matching emergent phenomena of such equilibria to real-world targets. Parameter sharing with decentralized execution has been introduced as an efficient way to train multiple agents using a single policy network. However, the nature of the resulting equilibria reached by such agents has not yet been studied: we introduce the novel concept of Shared equilibrium as a symmetric pure Nash equilibrium of a certain Functional Form Game (FFG) and prove convergence to the latter for a certain class of games using self-play. In addition, it is important that such equilibria satisfy certain constraints so that MAS are calibrated to real-world data for practical use: we solve this problem by introducing a novel dual-Reinforcement Learning based approach that fits emergent behaviors of agents in a Shared equilibrium to externally specified targets, and apply our methods to an n-player market example. We do so by calibrating parameters governing distributions of agent types rather than individual agents, which allows both behavior differentiation among agents and coherent scaling of the shared policy network to multiple agents.
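A minimal sketch of parameter sharing conditioned on agent-specific information, under assumptions not taken from the paper: a linear softmax policy, a one-hot type encoding, and hand-picked weights `W`. All agents use the same weights, and behavior differs only through the type features appended to the observation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def shared_policy(weights, observation, agent_type):
    """Action probabilities from one weight matrix shared by all agents."""
    # The agent-specific information is concatenated to the observation.
    features = list(observation) + list(agent_type)
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(logits)

# Hypothetical shared parameters: 2 actions, 1 observation feature + 2 type features.
W = [[1.0, 2.0, 0.0],
     [1.0, 0.0, 2.0]]
obs = [0.5]
probs_type_a = shared_policy(W, obs, [1.0, 0.0])
probs_type_b = shared_policy(W, obs, [0.0, 1.0])
```

Here the two agent types see the same observation but prefer different actions, even though a single set of parameters is being evaluated.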

agent, calibration, shared equilibria, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Learning in two-player zero-sum partially observable Markov games with perfect recall

Neural Information Processing Systems | Dec-24-2025, 05:18:06 GMT

We study the problem of learning a Nash equilibrium (NE) in an extensive game with imperfect information (EGII) through self-play. Precisely, we focus on two-player, zero-sum, episodic, tabular EGII under the perfect-recall assumption, where the only feedback is realizations of the game (bandit feedback). In particular, the dynamics of the EGII are not known: we can only access them by sampling or interacting with a game simulator. For this learning setting, we provide the Implicit Exploration Online Mirror Descent (IXOMD) algorithm. It is a model-free algorithm with a high-probability bound on the convergence rate to the NE of order $1/\sqrt{T}$, where $T$ is the number of played games. Moreover, IXOMD is computationally efficient, as it needs to perform updates only along the sampled trajectory.
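The two ingredients named in the abstract, implicit exploration and online mirror descent, can be sketched at a single decision point with bandit feedback. This is a simplified analog in the style of Exp3-IX, not the paper's IXOMD, which operates over the whole game tree; the step size `eta`, the exploration parameter `gamma`, and the fixed loss vector are illustrative assumptions.

```python
import math
import random

def ix_omd_step(policy, action, loss, eta, gamma):
    """One entropic mirror-descent step with an implicit-exploration loss estimate."""
    # IX estimate: importance weighting with gamma added to the denominator,
    # which trades a small bias for lower variance.
    estimate = loss / (policy[action] + gamma)
    weights = list(policy)
    weights[action] *= math.exp(-eta * estimate)  # only the sampled action is updated
    total = sum(weights)
    return [w / total for w in weights]

def run(losses, rounds, eta=0.1, gamma=0.05, seed=0):
    """Repeatedly sample an action, observe only its loss, and update the policy."""
    rng = random.Random(seed)
    policy = [1.0 / len(losses)] * len(losses)
    for _ in range(rounds):
        action = rng.choices(range(len(losses)), weights=policy)[0]
        policy = ix_omd_step(policy, action, losses[action], eta, gamma)
    return policy

one_step = ix_omd_step([0.5, 0.5], action=0, loss=1.0, eta=1.0, gamma=0.5)
final_policy = run(losses=[1.0, 0.0], rounds=500)
```

Because the loss estimate is zero for every action that was not sampled, each update touches a single weight before renormalizing, mirroring the abstract's remark that updates are needed only along the sampled trajectory.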

learning, name change, observable markov game, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (0.61)
Information Technology > Artificial Intelligence > Machine Learning (0.59)

Calibration of Shared Equilibria in General Sum Partially Observable Markov Games - Supplementary

Neural Information Processing Systems | Oct-3-2025, 18:12:56 GMT

B.4 Complete set of experimental results associated with Section 4: In this section we display the complete set of results associated with the figures shown in Section 4. We

bayesian optimization, customer, sequence, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > Canada (0.04)

Industry: Banking & Finance (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Calibration of Shared Equilibria in General Sum Partially Observable Markov Games

Neural Information Processing Systems | May-27-2025, 07:43:19 GMT

Training multi-agent systems (MAS) to achieve realistic equilibria gives us a useful tool to understand and model real-world systems. We consider a general sum partially observable Markov game where agents of different types share a single policy network, conditioned on agent-specific information. This paper aims at i) formally understanding equilibria reached by such agents, and ii) matching emergent phenomena of such equilibria to real-world targets. Parameter sharing with decentralized execution has been introduced as an efficient way to train multiple agents using a single policy network. However, the nature of the resulting equilibria reached by such agents has not yet been studied: we introduce the novel concept of Shared equilibrium as a symmetric pure Nash equilibrium of a certain Functional Form Game (FFG) and prove convergence to the latter for a certain class of games using self-play.

agent, artificial intelligence, machine learning, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.64)

Review for NeurIPS paper: Calibration of Shared Equilibria in General Sum Partially Observable Markov Games

Neural Information Processing Systems | Jan-27-2025, 05:13:44 GMT

Summary and Contributions: The paper presents the concept of shared equilibrium in certain kinds of multi-agent stochastic games with a restricted form of partial observability. The formalism includes the notion of supertypes (different distributions of agents) and types (where each agent is given a true type each episode). The agent's type influences the rewards available, as do the joint state of the system and the joint action over all agents. One key constraint is that all agents of the same type follow the same policy from an egocentric perspective (where they themselves are the focal agent and all other agents are interchangeable). The authors define a policy gradient approach for individual agents and also present a higher-order learning rule that shifts the distribution over supertypes at a slower timescale.

agent, observable markov game, supertype, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Review for NeurIPS paper: Calibration of Shared Equilibria in General Sum Partially Observable Markov Games

Neural Information Processing Systems | Jan-27-2025, 05:13:37 GMT

The paper was refereed by four knowledgeable reviewers. All reviewers appreciated the contributions of the paper:
- Formalization of self-play and a formal proof of when it is guaranteed to converge.
- A new algorithm for calibrating equilibria that is more effective than a naive use of Bayesian optimization (BO).
- Convincing results on a market agent scenario.
The biggest concern discussed among the reviewers was the extended transitivity assumption. While the rebuttal partially addressed it, the authors should add a longer discussion to the paper of which games satisfy this assumption. After the discussion, all reviewers agreed that the paper merits acceptance, and I join this decision.

observable markov game, reviewer, shared equilibria, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games

Neural Information Processing Systems | Oct-11-2024, 15:22:34 GMT

This paper considers the challenging tasks of Multi-Agent Reinforcement Learning (MARL) under partial observability, where each agent sees only her own observations and actions, which reveal incomplete information about the underlying state of the system. It studies these tasks under the general model of multiplayer general-sum Partially Observable Markov Games (POMGs), a model class significantly larger than the standard Imperfect Information Extensive-Form Games (IIEFGs). We identify a rich subclass of POMGs, the weakly revealing POMGs, in which sample-efficient learning is tractable. In the self-play setting, we prove that a simple algorithm combining optimism and Maximum Likelihood Estimation (MLE) is sufficient to find approximate Nash equilibria, correlated equilibria, and coarse correlated equilibria of weakly revealing POMGs in a polynomial number of samples when the number of agents is small. In the setting of playing against adversarial opponents, we show that a variant of our optimistic MLE algorithm achieves sublinear regret when compared against the optimal maximin policies.

observable markov game, pomg, sample-efficient reinforcement learning, (2 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Charge (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.65)
(2 more...)

Calibration of Shared Equilibria in General Sum Partially Observable Markov Games

Neural Information Processing Systems | Oct-10-2024, 23:41:21 GMT

Training multi-agent systems (MAS) to achieve realistic equilibria gives us a useful tool to understand and model real-world systems. We consider a general sum partially observable Markov game where agents of different types share a single policy network, conditioned on agent-specific information. This paper aims at i) formally understanding equilibria reached by such agents, and ii) matching emergent phenomena of such equilibria to real-world targets. Parameter sharing with decentralized execution has been introduced as an efficient way to train multiple agents using a single policy network. However, the nature of the resulting equilibria reached by such agents has not yet been studied: we introduce the novel concept of Shared equilibrium as a symmetric pure Nash equilibrium of a certain Functional Form Game (FFG) and prove convergence to the latter for a certain class of games using self-play.

agent, observable markov game, shared equilibria, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.64)

Learning in two-player zero-sum partially observable Markov games with perfect recall

Neural Information Processing Systems | Oct-10-2024, 19:28:30 GMT

We study the problem of learning a Nash equilibrium (NE) in an extensive game with imperfect information (EGII) through self-play. Precisely, we focus on two-player, zero-sum, episodic, tabular EGII under the perfect-recall assumption, where the only feedback is realizations of the game (bandit feedback). In particular, the dynamics of the EGII are not known: we can only access them by sampling or interacting with a game simulator. For this learning setting, we provide the Implicit Exploration Online Mirror Descent (IXOMD) algorithm. It is a model-free algorithm with a high-probability bound on the convergence rate to the NE of order $1/\sqrt{T}$, where $T$ is the number of played games.

learning, observable markov game, perfect recall, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)
